Hao Zhou,
University of Michigan, zhouhao@umich.edu [PRIMARY contact]
Anna Shaverdian, University of Michigan, annaas@umich.edu [PRIMARY contact]
H.V. Jagadish, University of Michigan, jag@umich.edu [Faculty advisor]
George Michailidis, University of Michigan, gmichail@umich.edu [Faculty advisor]
Our analysis is based on the Visual Analytic Algebra presented in [1]. This algebra provides a structured framework for visual analytics and is based on a number of operators, including selection and aggregation. We found that Cytoscape [2], a popular open source tool for biologists to visualize interaction networks, offers most of the desired functionality required by our proposed algebra to guide us in the identification of the underlying social structure.
Further, a number of the available plugins, together with the ability for one to implement in Java new ones, provide us with the required enhanced functionality. Although Cytoscape constitutes the main tool, a number of additional tools and methods, such as Java, R and Microsoft Office, were also employed for validation purposes and further analysis of the data set.
.
Video:
ANSWERS:
MC2.1:
Which of the two social structures, A or B, most closely match the scenario you
have identified in the data?
A
MC2.2:
Provide the social network structure you have identified as a tab delimitated
file. It should contain the employee, one or more handler, any middle folks,
and the localized leader with their international contacts. What are the
Flitter names of the persons involved? Please identify only key connections
(not all single links for example) as well as any other nodes related to the
scenario (if any) you may have discovered that were not described in the two
scenarios A and B above.
MC2.3:
Characterize the difference between your social network and the closest social
structure you selected (A or B). If you include extra nodes please explain how
they fit in to your
The social network we identified above perfectly matches
hypothesis A.
We
begin by exploring which hypothesis is more likely. This task is trying to identify a
subgraph. There exist plugins to
identify subgraphs, but there are several uncertainties existing in each
hypothesis. These uncertainties require
a more investigative series of eliminations to identify the criminal employee
and his network.
To
add to our comprehension of the network, we use the Network Analysis plugin to
compute several network statistics.
Network Analysis saves the node metrics as node attributes. We study the node degree distribution chart
to determine how to proceed in identifying the criminal network. Based on this chart, shown in Figure 1, if we
start by filtering employees we might find a solution faster because there are
fewer nodes at this degree range than the middle men degree range.
Figure 1: Hairball picture of network. Showing several Cytoscape features.
First,
we attempt to match social structure A.
We select all nodes with degree around 40 to collect our initial
employee suspect list. Using the ESP
Plugin, we enter in its simple query language: “Degree: {35 TO 45}” to select
and display these nodes in Cytoscape’s Data Panel. In the Data Panel, we specify which
attributes to display and order the nodes by a selected attribute. At this stage there are too many possible
employees to visually identify the social structure. So we continue to filter this list by
applying more of social structure A’s information.
Next
we identify the handlers: suspect employee neighbors with degrees between 30 to
40. In Cytoscape this task requires
three operations. First we select the
neighbors of the suspect employees with a plugin we wrote, Neighbor Node
Select. We save this list of nodes into
its own network view. In the Control
Panel, Cytoscape saves and lists several networks within one Cytoscape
session. This history mechanism is
useful for returning to different manipulations of a network. Next, we use ESP to select nodes which have a
degree 30 to 40. We save this list of
nodes as our initial handlers.
We
refine the current employee and handler lists by filtering employees with less
than three handlers. To do this we
create a sub network of the current employee and handler nodes from the
original graph by selecting these nodes from their saved lists. Using this sub network, we filter employees
which have less than degree 3. This
operation results in a refined suspect employees and handlers list. But there are still too many nodes to
visualize the crime network. So we
continue filtering.
To
complete social structure A, we find the possible Boris list. We select handler neighbors with degree 4 to
5. We see that only degree 5 Borises
exist. From the hypothesis description,
we know 3 handlers have a Boris in common.
We use this to refine the current employee, handler, and Boris lists. We have reduced the number of possible
networks but we continue to apply more information from social structure A to
pick a criminal network with high confidence.
The
final elimination we apply is that Boris must have a neighbor with degree
greater than 100: the fearless leader.
We can eliminate some Borises this way by ignoring the navy blue nodes
without a white node neighbor. We save
the final Employee, Boris, Handler, and Fearless leader lists. From the original graph, we select these
nodes and produce a new sub network. We
use our saved lists to extract the sub network including the employee, Boris,
handlers, and fearless leader nodes.
This network represents all possible criminal employee networks with
social structure A.
At
this stage we can finally observe the most likely social structure A network
from the limited possibilities. For
example, we know handlers do not communicate; therefore, node 171 is unlikely
the criminal employee. This process
helps us produce a solution, centered at employee node 100, which perfectly
fits social structure A. We complete the
network by identifying the International Contacts of the Leader. The fearless leader has a broad network of
14 international contacts and is located in a larger city. This extra information adds to our confidence
that node 100 is the criminal employee.
Figure 2 shows a few of the finals networks we analyze.
Figure 2: Possible Borises and different networks we
analyzed.
The
other possibilities available might be more networks that the Embassy may wish
to investigate. We discover other sub-networks similar to social structure A and B. But, we have lower confidence in these sub
networks because of their conflicts with the descriptions: incorrect degrees or
locations or handlers communications.
For example, the following sub-network is similar to
hypothesis A, except Boris here has a degree of 6:
ID |
Role |
Filter Name |
142 |
Employee |
@lafouge |
38 |
Handler |
@krintz |
101 |
Handler |
@lonning |
318 |
Handler |
@formenti |
4980 (degree = 6) |
Middleman |
@rosch |
11 |
Fearless
Leader |
@cornell |
Also, there exist sub-networks similar to hypothesis B
with the conflict being the middlemen degree.
Hypothesis B states the middlemen have a degree within 3-4. Here we show one example:
ID |
Role |
Filter Name |
224 |
Employee |
@usdin |
31 |
Handler
1 |
@degtyarev |
19 |
Handler
2 |
@marrevee |
79 |
Handler
3 |
@terekhov |
4769 |
Middleman
to Handler 1 |
@brockman |
3970 |
Middleman
to Handler 2 |
@rego |
2625 |
Middleman
to Handler 3 |
@hennebert |
5 |
Fearless
Leader |
@jantke |
MC2.4: How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one?
Sub network employee 100 fits in the city location
structure of the suspected network well. Both employee and handlers are in the
large city. The middleman is in the nearby middle sized city. The fearless
leader is also in the middle sized city with many international
connections. Figure 3 shows the location
distribution pie charts created in Exel to verify the members geographical
information.
Figure 3: Location distribution
of crime A network created in Excel.
MC2.5:
In general, how are the Flitter users dispersed throughout the cities of this
challenge? Which of the surrounding countries may have ties to this criminal
operation? Why might some be of more significant concern than others?
Over half of the flitter users are in the large
cities of Flovania. Only 7% of all the flitter users are not in Flovania. All three other neighbor countries may have
ties to criminal operation. However,
Othello has a slightly stronger connectivity.
The percentage of connections between foreign cities and suspected
members are higher than average for flitter users. Specifically, the three handlers are highly
connected to at least one of the three surrounding countries.
References:
[1] A. Shaverdian, H. Zhou, H.V. Jagadish, G. Michailidis. “Algebraic Visual Analysis: The Catalano
Phone Call Data Set Case Study.” VAKD
09.
[2]The Cytoscape Collaboration. Cytoscape Users Manual. The Cytoscape Collaboration,
Institute for Systems Biology and University of California San Diego, 2006.